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ABSTRACT 

The  determinination  of  the  number  of  signals  in  a 
wide  class  of  problems,  including  array  processing,  har¬ 
monic  retrieval  and  pole  retrieval,  is  addressed.  A  new 
approach,  based  on  the  application  of  the  information 

theoretic  criteria  for  model  identification  introduced  by 

Akaike.  Schwartz  and  Rissanen,  is  presented.  It  is  shown 
that  the  criterion  introduced  by  Schwartz  and  Rissanen 
yields  a  consistent  estimate  of  the  number  of  signals, 
while  the  criterion  introduced  by  Akaike  yields  an  incon¬ 
sistent  estimate  that  tends,  in  the  large-sample  limit,  to 
overestimate  the  number  of  signals. 
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introduced  in  section  III.  The  application  of  these  criteria 
to  the  problem  of  determining  the  number  of  signals 
from  the  multiplicity  of  the  smallest  eigenvalue  of  the 
covariance  matrix  is  presented  in  section  III.  The  con¬ 
sistency  of  these  criteria  is  discussed  in  section  IV.  it  is 
shown  that  the  criterion  introduced  by  Schwartz  and  Ris¬ 
sanen  yields  a  consistent  estimate  of  the  number  of  sig¬ 
nals.  while  the  criterion  introduced  by  Akaike  yields  an 
inconsistent  estimate  that  tends,  in  the  large-sample 
limit,  to  overestimate  the  number  of  signals.  Simulation 
results  that  illustrate  the  performance  of  the  new 
method  are  described  in  section  VI. 


I.  Introduction 

In  many  problems  in  signal  processing,  the  vector  of 
observations  can  be  modeled  as  a  linear  combination  of  a 
finite  number  of  signals  that  are  contaminated  by  addi¬ 
tive  noise.  This  is  the  case,  for  example,  in  the  harmonic 
retrieval  problem  [13],  in  the  array  processing  problem 
[12],  and  in  the  problem  of  retrieving  the  poles  of  a  sys¬ 
tem  from  the  natural  response  [15].  A  key  issue  in  these 
problems  is  that  of  determining  the  number  of  signals. 

A  promising  approach  to  the  probK  i  is  based  on 
the  observation  that  the  number  of  signals  can  be  deter¬ 
mined  from  the  multiplicity  of  the  smallest  eigenvalue 
of  the  covariance  matrix  of  the  observation  vector.  The 
conventional  method  used  for  determining  this  multipli¬ 
city  is  based  on  a  procedure  known  as  the  Bartlett- 
Lawley  test.  This  procedure  takes  the  form  of  a  sequence 
of  hypothesis  tests  for  the  multiplicity  of  the  smallest 
eigenvalue.  For  each  hypothesis,  the  likelihood  ratio 
statistic  is  compared  to  a  threshold.  The  hypothesis 
accepted  is  the  first  one  for  which  the  threshold  is 
crossed.  The  problem  with  this  method  is  the  subjective 
judgment  required  for  deciding  on  the  threshold  level. 

In  this  paper  we  present  a  new  method  for  deter¬ 
mining  the  number  of  signals.  It  is  based  on  the  applica¬ 
tion  of  the  information  theoretic  criteria  for  model 
identification  introduced  by  Akaike  (A1C),  and  by 
Schwartz  and  Rissanen  (MDL).  The  advantage  of  this 
method  is  that  no  subjective  judgement  is  required  in 
the  decision  process;  the  number  of  signals  is  deter¬ 
mined  as  the  value  for  which  the  chosen  criterion  is 
minimized. 

The  problem  is  formulated  in  section  II.  The  infor¬ 
mation  theoretic  criteria  for  model  identification  are 
Th-.s  work  was  supported  in  part  by  the  Air  Force  OT.ce  of  Fc.entific 
■lesearcri.  A.r  rorce  Systems  Command  under  Contract  A.-'4!>-S30-7<>-(.- 
CObfl,  the  C  S.  Army  keacarcn  Office,  under  Contract  :)AACZ9-7b-C-00in. 
and  sy  the  Joint  Ferr-ce*  Frog-am  a:  Ruioiord  University  under  Con- 
-,rsct  :)AA02»SI-K-00''7 


II.  Formulation  of  the  Problem 

Certain  important  problems  in  signal  processing 
such  as  harmonic  retrieval,  array  processing,  and  pole 
retrieval  from  the  natural  response,  have  identical  struc¬ 
ture;  their  observation  vector  can  be  expressed  as 

r(0  =  £A(0,)st(0  *  n<0  (1) 

i«l 

where  w,()  -  the  i-th  signal  -  i,  ■  zero. mean  complex 

random  process.  A(0,)  (i  =  1 . q  )  is  a  pxl  complex 

vector,  determined  by  the  dxl  parameter  vector  9, 
associated  with  the  i-th  signal,  and  n(  )  is  a  pxl  com¬ 
plex  vector  of  additive  white  noise.  We  use  complex  (ana¬ 
lytic  signal)  representation  since  this  is  the  natural 
representation  for  the  problems  of  interest. 

A  key  issue  in  the  generic  model  described  in  (1)  is 
that  of  determining  the  number  of  signals  q  from  the 

observed  data  r(tk)  (*=  1 . N  ). 

A  promising  approach  to  this  problem  is  based  on 
the  analysis  of  the  eigenstructure  of  the  covariance 
matrix  of  the  observation  vector  r(  ).  To  introduce  this 
approach,  let  us.  first,  rewrite  (1)  as 

r(t)  =  A*(f )  +  n(f )  (2  a) 

where  A  is  the  pxq  matrix 

A  =  [  A(9,).  .  .  A(9f)  ]  (2. b) 

and  s(f)  is  the  7x1  vector 

•  r(t)  =  [*,(!)  .  ..  s,(f)]  (2.c) 

The  covariance  matrix  of  r(f)  .  assuming  that  the  sig¬ 
nals  and  noises  are  uncorrelated,  is  given  by 

R  =  +  +  o*I  (3. a) 

where 

+  =  ASA*  (3.b) 


(rravi*^  -  riroK-'fe-opjy 
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with  f  denoting  the  conjugate  transpose,  and  S  denot¬ 
ing  the  covariance  matrix  of  the  sources,  i.e.. 

S  =  *[■(!  )«*(01  (3,c) 

We  assume  that  the  following  conditions  hold: 

(I)  S  is  nonsingutar,  i.e.,  the  signals  are  noncoherent. 

(II)  The  matrix  A  is  of  full  column  rank,  i.e.,  the  vec¬ 
tors  A(9,)  (i  =1 . q  )  &re  linearly  independent. 

Under  the  above  assumptions,  it  follows  that  the 

rank  of  +  is  q,  that  is.  its  p-q  smallest  eigenvalues 
are  equal  to  zero.  Denoting  the  eigenvalues  of  R  by 
ii»Aj  ii,  it  follows,  therefore,  that  the  small 
p-q  eigenvalues  of  R  are  equal  to  a2  ,i.e„ 

» i  =  \  »a  =  •  =  Xy  =  o*  (4) 

The  rank  q  can  hence  be  determined  from  the  multipli¬ 
city  of  the  smallest  eigenvalue  of  R. 

The  conventional  approach  to  the  problem,  pro¬ 
posed  by  Bartlett  (1954)  and  Lawley  (1956),  is  based  on  a 
sequence  of  hypothesis  tests  for  the  multiplicity  of  the 
smallest  eigenvalue.  The  problem  associated  with  this 
approach  is  the  subjective  judgement  required  in  choos¬ 
ing  the  threshold  levels. 

In  this  paper  we  lake  a  different  approach.  Given 

the  observations  r(f,) . r{tN).  we  try  to  determine 

which  of  the  family  covariance  matrices 

R*  =  +»  +  I  (5) 

where  'trk  is  a  semi-positive  matrix  of  rank  k  and  o  is 
an  unknown  scalar,  best  fits  the  data. 

Posed  in  this  way.  it  is  clear  that  the  problem  is  that 
of  model  identification,  and  therefore  the  information 
theoretic  criteria  for  model  identification  can  be  applied. 


III.  Information  Theoretic  Criteria 

The  first  objective  procedure  for  model 
identification  was  proposed  by  Akaike  (1973)  (1974). 
When  there  are  several  competing  models,  this  pro¬ 
cedure  selects  the  model  which  gives  the  minimum  A1C, 
defined  by 

maximum  number  of  free 

AIC=-21og,ik  ...  .  4*2  adjusted  parameters  (B) 

^lKeunooa  within  the  model 

The  first  term  is  the  well-known  log-likelihood  of  the 
maximum  likelihood  estimates  of  the  parameters  of  the 
model.  The  second  term  is  a  bias  correction  term, 
inserted  so  as  to  make  the  AiC  an  estimate  of  the  mean 
Kulback-Liebler  distance  between  the  true  distribution 
and  the  estimated  distribution,  determined  by  the 
maximum-likelihood  method. 

Inspired  by  Akaike's  pioneering  work,  Schwartz 
(1978)  and  Rissanen  (1978)  approached  the  problem 
from  quite  different  points  of  view.  Schwartz's  approach 
is  based  on  Bayesian  arguments.  He  assumed  that  each 
competing  model  can  be  assigned  a  prior  probability, 
and  proposed  to  select  the  model  that  yields  the  max¬ 
imum  posterior  probability.  Rissanen's  approach  is 
based  on  information  theoretic  arguments.  Since  each 
model  can  be  used  to  encode  the  observed  sequence, 
Rissanen  proposed  to  select  the  model  that  yields  the 
minimum  code  length  of  the  observed  data.  It  turns  out 
that  in  the  large-sample  limit,  both  Schwartz's  and 
Rissanen's  approaches  yield  the  same  criterion,  given  by 


maximum]  j  f  number  of  free 
MDL= -log  ^adjusted  parameters  log,V  (7) 

^likelihood  J  2  [  within  the  model 

where  N  denotes  the  number  of  observations.  We  call 
the  criterion  MDL  (for  Minimum  Description  Length)  as 
done  in  Rissanen  (19B3).  since  Rissanen's  derivation  is 
more  general:  Schwartz's  derivation  is  restricted  to  the 
case  that  the  observations  are  independent  and  come 
from  an  exponential  distribution. 


IV.  Determining  the  Number  of  Signals 


To  apply  the  information  theoretic  criteria  to  deter¬ 
mine  the  number  of  signals,  or  equivalently,  to  deter- 

mins  ihs  rank  of  tho  matrix  ^  ,  ws  must  first 

parameterize  the  model.  Using  the  well-known  spectral 
representation  theorem,  we  can  express  R*  as 

r*  =  t  (Vi  -  +  o*i  (b) 

<*i 

where  A*  j  .....A**  and  Y*t,  .  .  .  ,VkJt  are  the  eigen¬ 
values  and  eigenvectors,  respectively,  of  R*.  Denoting 
by  the  vector  of  the  parameters  of  the  model,  it  fol¬ 
lows  that 

*1  =  (  Vi . Xt  k,0 . Vis)  (9) 


With  this  parameterization,  we  can  now  derive  the 
maximum  likelihood  estimates  ol  the  parameters. 
Assuming  that  the  observations  r,.  .  .  .  r,v  are 
independently  and  identically  distributed  as  iVp(O.R).  it 
follows  that,  up  to  a  constant,  the  log-likelihood  is  given 

by 

/,(*,)  =  -AHog  det  R.  -  tr  R,-'R  (lO.a) 
where  R  is  the  sample-covariance  matrix  defined  by 

R  =  jrt  r(f,)r(f,)T  ( i o. b ) 

/V  1  =  1 


The  maximum-likelihood  (ML)  estimate  of  is  the 

value  of  ♦*  that  maximizes  Z,(«t>fc ).  Following  Anderson 
(1963),  the  ML  estimates  are  given  by 

V,  =  k  i  =  1 . *  (11.  a) 


(11. b) 


v*.,  =  c,  i  =  ! . k  (H.c) 


where  l|  >1,  •  •  ■  >  Ip  and  Cj . Cp  are  the  eigen¬ 

values  and  eigenvectors  ,  respectively,  of  the  sample 
covariance  matrix  R. 


Substituting  the  maximum  likelihood  estimates  (11) 
in  the  log-likelihood  (10),  we  obtain,  after  some  algebra 

fmairimiim]  f  ^  l 


—5 —  *  t, 

p—k  , j 


likelihood 


The  number  of  independently  adjusted  parameters 
in  the  model  is  obtained  by  counting  the  number  of 
degrees  of  freedom  of  the  space  spanned  by  the  parame¬ 
ter  vector  ♦».  Recalling  that  the  eigenvalues  of  a  com¬ 
plex  covariance  matrix  are  real,  but  that  the  eigenvec¬ 
tors  are  complex,  it  follows  that  has  k  ♦  1  ♦  2pk 
parameters.  However,  not  all  of  the  parameters  are 
independently  adjusted;  the  eigenvectors  are  con¬ 
strained  to  have  unit  norm  and  to  be  mutuallv 
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orthogonal.  Tbil  amount*  to  reduction  of  2k  degrees 
of  freedom  due  to  the  normalization  and  2  ^-fc(*-l) 

degrees  of  freedom  due  to  the  mutual  orthogonalization. 
Thus,  we  obtain 


number  of  free  f ,  1 

adjusted  parameters  =  k  +  l+2p*  -  4*(fc-l)| 
,  within  the  model  Is  J 


=  fc(2p-*)+l  (13) 


The  form  of  A1C  for  this  problem  is  therefore  given  by 


ft  k 

<•*  M 

1  A  . 

B-*  £  *» 

P  K  4**M 

AIC(fc)=-21og 


while  the  MOL  criterion  is  given  by 

jak±l 


+2k(2p-k)  (14) 


MDL(Jfc)=-log 


iTT 


p-k 


+  |*(2p-fc)logiV  (15) 


The  implementation  of  these  criteria  is  as  follows.  First, 
the  chosen  criterion  is  computed  for  every  possible  k. 

that  is.  k  -  0.1 . p  —  1.  The  rank  of  +  is  then 

determined  as  the  value  of  k  for  which  the  criterion  is 
minimized. 


V.  Consistency  of  the  MDL  Criterion 

The  concept  of  consistency  is  fundamental  in  sta¬ 
tistical  inference.  In  our  problem,  consistency  means 
convergence  of  the  selection  criterion  to  the  true  rank 
q  in  the  large-sample  limit.  We  shall  show,  by  generaliz¬ 
ing  a  method  of  proof  given  in  Rissanen  (I960)  and  Han¬ 
nan  and  Quinn  (1979),  that  the  MDL  yields  a  consistent 
estimate,  and  that  the  AIC  yields  an  inconsistent  esti¬ 
mate  that  tends,  in  the  large-sample  limit,  to  overesti¬ 
mate  the  true  rank. 

The  consistency  of  the  MDL  is  proved  by  showing 

that  in  the  large-sample  limit,  the  criterion  MDL(k)  is 

minimized  for  the  true  rank  k  -  q.  Taking  first  k  <q, 
it  follows  from  (15)  that 


jf{  MDL(q)  -  MDL(k)]  = 


Now,  it  is  well-known  that  the  eigenvectors  of  the 
sample-covariance  matrix  t,  (i=l . p)  arc 


consistent  estimates  of  the  eigenvectors  of  the  true 
covariance  matrix  A,.  Thus,  in  the  large-sample  limit 

the  eigenvectors  1,  (t=fc  +  l . q  )  are  not  all  equal 

with  probability  one.  Therefore,  by  the  arithmetic-mean 
geometric-mean  inequality,  it  follows  that  in  the  large- 
sample  limit 


Thus,  the  first  term  in  (16)  is  negative  with  probability 
one  in  the  large-sample  limit.  Similarly,  by  the  general¬ 
ized  arithmetic-mean  geometric-mean  inequality 

to, A,  +  10,+™,=  1  (18) 

it  follows  that 


£  1  > 

i 

f*  1  j 

L± 

p-k 

1  £  , 

L  H  > 

P-k  J 

L  H 

„  -k  ^ 

[9  K  «**» i 

k 

|r-t 


(19) 


Thus,  the  second  term  in  (18)  is  also  negative  with  proba¬ 
bility  one  in  the  large-sample  limit.  Now.  since  the  last 
term  in  (18)  goes  to  zero  as  the  sample  size  increases, 
the  difference  [  MDL{q)  —  MDL(k)  ]  is  negative  with 
probability  one  in  the  large-sample  limit,  for  k  <  q . 

Taking  now  k  >  q .  it  follows  from  (15)  that 
2[  MDL(k)  -  MDL(q)  ]  = 


-21og 

ft  «. 

<=*♦1 

N 

+21og 

ft  i. 

i~k  *  1 

N 

P  9 4=7*1 

P  Ki*k+l 

+  (*  -g)(Zp-*:-g  +  l)Iog7V 


(20) 


Note  that  the  terms  in  the  curly  brackets  are  twice  the 
log-likelihoods  of  the  maximum  likelihood  estimator 
under  the  hypotheses  that  the  rank  of  +  is  q  and  k, 
respectively.  Thus,  their  difference  is  the  likelihood-ratio 
for  deciding  between  these  two  hypothesises.  From  the 
general  theory  of  likelihood  ratios  (see  e.g  Cox  and  Hink- 
ley  (1974))  it  follows  that  the  asymptotic  distribution  of 
this  statistic  is  with  number  of  degrees  of  freedom 
equal  to  the  difference  of  the  dimensions  of  the  parame¬ 
ter  spaces  under  the  two  hypothesises,  i.e., 

[  fc(2p-*)+l  ]  -  [?(2p-g)+l  ]  =  (k  ~q)(Zp -k -q  +  1) 

Thus,  as  the  sample  size  increase,  the  probability  that 
the  term  in  the  curly  brackets  in  (18)  exceeds  the  last 
term  in  (!B),  is  given  by  the  area  in  the  tail  from 
(fc  ~q  )(2p  —k  —q  +  lJlogAl  of  the  mentioned  distribu¬ 
tion  with  (k  —  q  )(2p  -k  —q  + 1)  degrees  of  freedom. 
Since  the  area  in  this  tail  approaches  zero  as  the  sample 
size  increases,  it  follows  that  in  the  large-sample  limit 
the  difference  [  HDL(k)  -  MDL(q)  ]  is  positive  with 
probability  one  for  k  >q.  Combining  this  with  the 
previous  result  for  k<q,  it  follows  that  MDL(k)  has 
a  minimum  at  it  =  q . 

Repeating  the  above  arguments  for  the  AIC,  it  fol¬ 
lows  that  in  the  large-sample  limit  [  AIC{q )  -  AIC{k)  ) 
is  negative  with  probability  one  for  k  <  q.  However,  for 
k>q  .  [  AIC(k)  -  AIC(q)  ]  has  non-zero  probability 
to  be  negative  even  in  the  large-sample  limit,  since  the 
tail  of  the  probability  distribution  from 
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(k~q)(2p-k-q +  1)  of  the  mentioned  x*  distribution 
with  (k-q)(2p-k- q  +  1)  degrees  of  freedom,  is  no/ 
zero  in  this  case.  Hence,  the  A1C  tends,  in  the  targe- 
sample  limit,  to  overestimate  the  rank  q . 


VII.  Simulation  Results 

In  this  section  we  present  simulation  results  that 
illustrate  the  performance  of  the  proposed  method.  The 
examples  are  taken  from  array  processing,  though,  by 
the  well-known  duality  between  spatial  frequency  and 
temporal  frequency,  they  can  also  be  interpreted  in  the 
context  of  harmonic-retrieval.  The  examples  refer  to  a 
uniform  linear  array  of  p  sensors,  with  q  incoherent 
sinusoidal  plane-waves  impinging  from  directions 
The  received  signal  at  the  t'th  sensor 
(i  =  1 . p)  is  thus  given  by 

r,(/)  =  +.  n{f )  (21) 

•  ■i 

where 

Pit  =  random  phase  uniformly  distributed  on  (0,2rr) 
n(  )=  white  noise  with  mean  zero  and  variance  a2 
Note  that  this  model  is  a  special  case  of  the  generic 
model  presented  in  (1). 

In  the  first  example,  we  considered  an  array  with 
seven  sensors  (p  =  7)  and  two  sources  (q  =  2).  The 

signal-to-noise  ratio,  defined  as  10  log  — — was  12 dB 

2  c* 

and  the  directions-of-arrivals  were  20°  and  23°.  Using 
N  -  100  samples,  the  values  of  the  A1C  obtained  by  (14). 
are  given  by 


AIC(O) 

AIC(l) 

A1C(2) 

AIC|3) 

AIC(-I) 

A1C(5) 

AIC(6) 

1015.6 

45.3 

43.1 

466 

51.1 

51.6 

55.0 

Clearly,  the  minimum  value  of  the  A1C  is  obtained  for  the 

,  =2. 

In  the  second  example,  we  added  another  source  at 
-5°  to  the  scenario  described  in  the  first  example,  and 
raised  the  stgnal-to-nolse  ratio  to  80 dB.  The  results 
obtained  in  this  case  were 


AIC(0) 

A1C(1) 

AIC(S) 

AIC(3) 

AJC(t) 

AIC(S| 

AIC(«) 

2821.5 

162.4 

51.0 

408 

50.0 

51.0 

550 

Note  that  the  minimum  A1C  is  obtained,  correctly,  for 
q  -  3.  The  two  examples  above  demonstrate  clearly  the 
ability  of  the  new  procedure  to  detect  correctly  the 
number  of  sources  even  in  relatively  difficult  scenario  of 
two  closely  spaced  sources  and  low  signal-to-noise  ratio. 


VII.  Concluding  Remarks 

The  method  we  have  described  is  a  time-domain 
method  since  it  is  based  on  the  processing  of  the 
covariance  matrix  of  the  observation  vector.  In  some 
cases,  especially  in  array  processing,  the  frequency- 
domain  is  more  natural.  The  extension  of  the  method  to 
the  frequency  domain,  that  is  to  the  problem  of  deter¬ 
mining  the  number  of  signals  from  the  spectral-density 
matrix  of  the  observation  vector  is  presented  in  [14]. 
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